Overview

Dataset statistics

Number of variables: 14
Number of observations: 636262
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 55.8 MiB
Average record size in memory: 92.0 B
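The average record size follows from the total memory footprint and the number of observations. A minimal sketch of that arithmetic, assuming the report uses binary mebibytes (1 MiB = 2**20 bytes); the function name is illustrative and not part of the report:

```python
# Figures copied from the dataset statistics above.
total_size_mib = 55.8        # "Total size in memory"
n_observations = 636_262     # "Number of observations"
n_variables = 14             # "Number of variables"

BYTES_PER_MIB = 1024 ** 2    # assumption: MiB means 2**20 bytes


def average_record_size(total_mib: float, n_rows: int) -> float:
    """Return the mean number of bytes used per row."""
    return total_mib * BYTES_PER_MIB / n_rows


avg_bytes = average_record_size(total_size_mib, n_observations)
print(f"{avg_bytes:.1f} B per record")
print(f"{avg_bytes / n_variables:.2f} B per cell on average")
```

Because the total size is itself rounded to one decimal, the result only agrees with the reported 92.0 B to that precision.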

Variable types

Numeric: 9
Categorical: 5

Warnings

name_orig has a high cardinality: 636170 distinct values High cardinality
name_dest has a high cardinality: 457297 distinct values High cardinality
step is highly correlated with days High correlation
amount is highly correlated with dif_balance_dest High correlation
old_balance_org is highly correlated with new_balance_orig High correlation
new_balance_orig is highly correlated with old_balance_org High correlation
old_balance_dest is highly correlated with new_balance_dest High correlation
new_balance_dest is highly correlated with old_balance_dest High correlation
days is highly correlated with step High correlation
dif_balance_dest is highly correlated with amount High correlation
step is highly correlated with days High correlation
amount is highly correlated with old_balance_dest and 1 other field High correlation
old_balance_org is highly correlated with new_balance_orig High correlation
new_balance_orig is highly correlated with old_balance_org and 1 other field High correlation
old_balance_dest is highly correlated with amount and 1 other field High correlation
new_balance_dest is highly correlated with amount and 1 other field High correlation
days is highly correlated with step High correlation
dif_balance_orig is highly correlated with dif_balance_dest High correlation
dif_balance_dest is highly correlated with new_balance_orig and 1 other field High correlation
step is highly correlated with days High correlation
old_balance_org is highly correlated with new_balance_orig High correlation
new_balance_orig is highly correlated with old_balance_org and 1 other field High correlation
old_balance_dest is highly correlated with new_balance_dest High correlation
new_balance_dest is highly correlated with old_balance_dest High correlation
days is highly correlated with step High correlation
dif_balance_dest is highly correlated with new_balance_orig High correlation
dif_balance_orig is highly correlated with is_fraud High correlation
amount is highly correlated with new_balance_dest and 1 other field High correlation
old_balance_org is highly correlated with new_balance_orig High correlation
old_balance_dest is highly correlated with new_balance_dest High correlation
is_fraud is highly correlated with dif_balance_orig High correlation
new_balance_dest is highly correlated with amount and 2 other fields High correlation
days is highly correlated with step High correlation
new_balance_orig is highly correlated with old_balance_org High correlation
step is highly correlated with days High correlation
dif_balance_dest is highly correlated with amount and 1 other field High correlation
amount is highly skewed (γ1 = 30.74367403) Skewed
old_balance_dest is highly skewed (γ1 = 21.97917021) Skewed
new_balance_dest is highly skewed (γ1 = 21.22636428) Skewed
dif_balance_orig is highly skewed (γ1 = -26.86841927) Skewed
dif_balance_dest is highly skewed (γ1 = 32.5985827) Skewed
name_orig is uniformly distributed Uniform
name_dest is uniformly distributed Uniform
old_balance_org has 210285 (33.1%) zeros Zeros
new_balance_orig has 361277 (56.8%) zeros Zeros
old_balance_dest has 270408 (42.5%) zeros Zeros
new_balance_dest has 243933 (38.3%) zeros Zeros
dif_balance_orig has 208941 (32.8%) zeros Zeros
dif_balance_dest has 231697 (36.4%) zeros Zeros
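The Skewed and Zeros warnings above rest on two simple column statistics. The sketch below shows one way to compute them in plain Python. It uses the population moment coefficient of skewness; pandas applies a bias correction in `Series.skew`, so the two differ slightly on small samples. The toy column is hypothetical and only illustrates how a single large value drives γ1 up.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # a constant column has no skew
    return m3 / m2 ** 1.5


def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for x in values if x == 0) / len(values)


# Hypothetical toy column: mostly small amounts plus one very large transfer.
toy_amounts = [0, 0, 10, 20, 30, 40, 50, 10_000]
print(f"toy skewness: {skewness(toy_amounts):.3f}")
print(f"toy zero share: {zero_share(toy_amounts):.1%}")

# Zero share for old_balance_org, using the counts reported above.
old_balance_org_zero_share = 210_285 / 636_262
print(f"old_balance_org zeros: {old_balance_org_zero_share:.1%}")
```

In this report every column flagged as Skewed has |γ1| above 20, while old_balance_org (γ1 ≈ 5.25) is not flagged, which suggests a warning threshold of about 20; that threshold is an inference from the figures, not something the report states.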

Reproduction

Analysis started: 2021-08-18 17:03:49.762913
Analysis finished: 2021-08-18 17:04:58.163471
Duration: 1 minute and 8.4 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
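The reported duration can be recovered from the two timestamps. A small sketch using only the standard library; the timestamp format string is an assumption based on how the values are printed above:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # assumed layout of the timestamps above

started = datetime.strptime("2021-08-18 17:03:49.762913", FMT)
finished = datetime.strptime("2021-08-18 17:04:58.163471", FMT)

elapsed = finished - started
# Split the elapsed seconds into whole minutes and remaining seconds.
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minute and {seconds:.1f} seconds")
```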

Variables

step
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 629
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 243.6019501
Minimum: 1
Maximum: 743
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 1
5-th percentile: 16
Q1: 156
median: 241
Q3: 335
95-th percentile: 492
Maximum: 743
Range: 742
Interquartile range (IQR): 179

Descriptive statistics

Standard deviation: 142.3980298
Coefficient of variation (CV): 0.5845520928
Kurtosis: 0.3160876985
Mean: 243.6019501
Median Absolute Deviation (MAD): 90
Skewness: 0.3691862978
Sum: 154994664
Variance: 20277.19888
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value                Count     Frequency (%)
19                   5214      0.8%
187                  4976      0.8%
18                   4968      0.8%
307                  4701      0.7%
235                  4674      0.7%
163                  4627      0.7%
139                  4560      0.7%
355                  4531      0.7%
403                  4482      0.7%
259                  4480      0.7%
Other values (619)   589049    92.6%

Smallest values

Value                Count     Frequency (%)
1                    262       < 0.1%
2                    103       < 0.1%
3                    52        < 0.1%
4                    53        < 0.1%
5                    69        < 0.1%
6                    176       < 0.1%
7                    666       0.1%
8                    2156      0.3%
9                    3751      0.6%
10                   3614      0.6%

Largest values

Value                Count     Frequency (%)
743                  2         < 0.1%
741                  2         < 0.1%
740                  1         < 0.1%
739                  2         < 0.1%
737                  2         < 0.1%
736                  2         < 0.1%
735                  1         < 0.1%
734                  2         < 0.1%
733                  1         < 0.1%
731                  2         < 0.1%
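Several of the step statistics above are derived from others, so they can be cross-checked with a few lines of arithmetic. The sketch below recomputes range, IQR, coefficient of variation, variance and sum from the reported mean, standard deviation, quartiles and row count. The tolerance is an arbitrary choice that allows for the rounding of the printed values.

```python
import math

n = 636_262                        # number of observations
mean, std = 243.6019501, 142.3980298
q1, q3 = 156, 335
minimum, maximum = 1, 743

# Quantities that follow directly from the values above.
derived = {
    "range": maximum - minimum,
    "iqr": q3 - q1,
    "cv": std / mean,              # coefficient of variation
    "variance": std ** 2,
    "sum": mean * n,
}

# Values as printed in the report.
reported = {
    "range": 742,
    "iqr": 179,
    "cv": 0.5845520928,
    "variance": 20277.19888,
    "sum": 154994664,
}

for name, value in derived.items():
    ok = math.isclose(value, reported[name], rel_tol=1e-6)
    print(f"{name:>8}: derived={value:.6f} reported={reported[name]} match={ok}")

consistent = all(
    math.isclose(derived[k], reported[k], rel_tol=1e-6) for k in reported
)
```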

type
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.9 MiB
CASH_OUT: 223667
PAYMENT: 215258
CASH_IN: 140099
TRANSFER: 53153
DEBIT: 4085

Length

Max length: 8
Median length: 7
Mean length: 7.422231722
Min length: 5

Characters and Unicode

Total characters: 4722484
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CASH_OUT
2nd row: CASH_OUT
3rd row: TRANSFER
4th row: PAYMENT
5th row: PAYMENT

Common Values

Value                Count     Frequency (%)
CASH_OUT             223667    35.2%
PAYMENT              215258    33.8%
CASH_IN              140099    22.0%
TRANSFER             53153     8.4%
DEBIT                4085      0.6%
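The length and character totals for type follow directly from the category counts. A short sketch that rebuilds the total character count, the mean length and the frequency percentages from the Common Values table above:

```python
# Category counts copied from the Common Values table for `type`.
counts = {
    "CASH_OUT": 223_667,
    "PAYMENT": 215_258,
    "CASH_IN": 140_099,
    "TRANSFER": 53_153,
    "DEBIT": 4_085,
}

total_rows = sum(counts.values())
# Each row contributes len(label) characters.
total_chars = sum(len(label) * n for label, n in counts.items())
mean_length = total_chars / total_rows

# Share of each category, as a percentage rounded to one decimal.
shares = {label: round(100 * n / total_rows, 1) for label, n in counts.items()}

print(total_rows, total_chars, f"{mean_length:.9f}")
for label, pct in shares.items():
    print(f"{label:<9} {pct}%")
```

The totals agree with the reported 636262 rows, 4722484 characters and mean length of 7.422231722.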

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
cash_out223667
35.2%
payment215258
33.8%
cash_in140099
22.0%
transfer53153
 
8.4%
debit4085
 
0.6%

Most occurring characters

ValueCountFrequency (%)
A632177
13.4%
T496163
10.5%
S416919
8.8%
N408510
8.7%
C363766
 
7.7%
H363766
 
7.7%
_363766
 
7.7%
E272496
 
5.8%
O223667
 
4.7%
U223667
 
4.7%
Other values (8)957587
20.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter4358718
92.3%
Connector Punctuation363766
 
7.7%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A632177
14.5%
T496163
11.4%
S416919
9.6%
N408510
9.4%
C363766
8.3%
H363766
8.3%
E272496
 
6.3%
O223667
 
5.1%
U223667
 
5.1%
P215258
 
4.9%
Other values (7)742329
17.0%
Connector Punctuation
ValueCountFrequency (%)
_363766
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4358718
92.3%
Common363766
 
7.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
A632177
14.5%
T496163
11.4%
S416919
9.6%
N408510
9.4%
C363766
8.3%
H363766
8.3%
E272496
 
6.3%
O223667
 
5.1%
U223667
 
5.1%
P215258
 
4.9%
Other values (7)742329
17.0%
Common
ValueCountFrequency (%)
_363766
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII4722484
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A632177
13.4%
T496163
10.5%
S416919
8.8%
N408510
8.7%
C363766
 
7.7%
H363766
 
7.7%
_363766
 
7.7%
E272496
 
5.8%
O223667
 
4.7%
U223667
 
4.7%
Other values (8)957587
20.3%

amount
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 622517
Distinct (%): 97.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 180035.7065
Minimum: 0
Maximum: 69886731.3
Zeros: 3
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2224.931
Q1: 13429.5075
median: 74896.38
Q3: 208974.285
95-th percentile: 517876.791
Maximum: 69886731.3
Range: 69886731.3
Interquartile range (IQR): 195544.7775

Descriptive statistics

Standard deviation: 600588.416
Coefficient of variation (CV): 3.335940562
Kurtosis: 1757.615666
Mean: 180035.7065
Median Absolute Deviation (MAD): 68391.61
Skewness: 30.74367403
Sum: 1.145498787 × 10^11
Variance: 3.607064455 × 10^11
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value                   Count     Frequency (%)
10000000                323       0.1%
15000                   10        < 0.1%
5000                    7         < 0.1%
100000                  5         < 0.1%
10000                   5         < 0.1%
8155.63                 4         < 0.1%
4374.97                 4         < 0.1%
2870.03                 4         < 0.1%
10180.18                4         < 0.1%
5320.17                 4         < 0.1%
Other values (622507)   635892    99.9%

Smallest values

Value                   Count     Frequency (%)
0                       3         < 0.1%
0.01                    1         < 0.1%
0.24                    1         < 0.1%
0.41                    1         < 0.1%
0.43                    1         < 0.1%
0.51                    1         < 0.1%
0.55                    1         < 0.1%
0.58                    1         < 0.1%
0.6                     1         < 0.1%
0.68                    1         < 0.1%

Largest values

Value                   Count     Frequency (%)
69886731.3              1         < 0.1%
63294839.63             1         < 0.1%
51141938.17             1         < 0.1%
49522819.36             1         < 0.1%
47825067.55             1         < 0.1%
46874551.96             1         < 0.1%
46698160.51             1         < 0.1%
42684988.59             1         < 0.1%
41818052.2              1         < 0.1%
40891132.65             1         < 0.1%

name_orig
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 636170
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 4.9 MiB
C501544405: 2
C1125240410: 2
C491393722: 2
C2121401286: 2
C786304962: 2
Other values (636165): 636252

Length

Max length: 11
Median length: 11
Mean length: 10.48223216
Min length: 5

Characters and Unicode

Total characters: 6669446
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 636078
Unique (%): > 99.9%

Sample

1st row: C1389413404
2nd row: C958468196
3rd row: C857481806
4th row: C558963849
5th row: C1644082954

Common Values

Value                   Count     Frequency (%)
C501544405              2         < 0.1%
C1125240410             2         < 0.1%
C491393722              2         < 0.1%
C2121401286             2         < 0.1%
C786304962              2         < 0.1%
C696791913              2         < 0.1%
C331659194              2         < 0.1%
C1820871729             2         < 0.1%
C136566605              2         < 0.1%
C469443722              2         < 0.1%
Other values (636160)   636242    > 99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
c10374250172
 
< 0.1%
c16335256352
 
< 0.1%
c3316591942
 
< 0.1%
c15379341792
 
< 0.1%
c2046852202
 
< 0.1%
c13023153692
 
< 0.1%
c2714833212
 
< 0.1%
c1423128112
 
< 0.1%
c7400293522
 
< 0.1%
c10871774682
 
< 0.1%
Other values (636160)636242
> 99.9%

Most occurring characters

ValueCountFrequency (%)
1880544
13.2%
C636262
9.5%
2613975
9.2%
4570056
8.5%
3569459
8.5%
7567600
8.5%
5567437
8.5%
8566583
8.5%
9566310
8.5%
0565624
8.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number6033184
90.5%
Uppercase Letter636262
 
9.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1880544
14.6%
2613975
10.2%
4570056
9.4%
3569459
9.4%
7567600
9.4%
5567437
9.4%
8566583
9.4%
9566310
9.4%
0565624
9.4%
6565596
9.4%
Uppercase Letter
ValueCountFrequency (%)
C636262
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common6033184
90.5%
Latin636262
 
9.5%

Most frequent character per script

Common
ValueCountFrequency (%)
1880544
14.6%
2613975
10.2%
4570056
9.4%
3569459
9.4%
7567600
9.4%
5567437
9.4%
8566583
9.4%
9566310
9.4%
0565624
9.4%
6565596
9.4%
Latin
ValueCountFrequency (%)
C636262
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII6669446
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1880544
13.2%
C636262
9.5%
2613975
9.2%
4570056
8.5%
3569459
8.5%
7567600
8.5%
5567437
8.5%
8566583
8.5%
9566310
8.5%
0565624
8.5%

old_balance_org
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 258462
Distinct (%): 40.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 838195.6819
Minimum: 0
Maximum: 59585040.37
Zeros: 210285
Zeros (%): 33.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 14057.34
Q3: 107363.405
95-th percentile: 5872246.466
Maximum: 59585040.37
Range: 59585040.37
Interquartile range (IQR): 107363.405

Descriptive statistics

Standard deviation: 2900800.457
Coefficient of variation (CV): 3.460767598
Kurtosis: 33.09617991
Mean: 838195.6819
Median Absolute Deviation (MAD): 14057.34
Skewness: 5.248479964
Sum: 5.33312061 × 10^11
Variance: 8.414643292 × 10^12
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value                   Count     Frequency (%)
0                       210285    33.1%
162                     120       < 0.1%
109                     110       < 0.1%
181                     102       < 0.1%
125                     100       < 0.1%
161                     99        < 0.1%
111                     99        < 0.1%
120                     98        < 0.1%
106                     98        < 0.1%
115                     97        < 0.1%
Other values (258452)   425054    66.8%

Smallest values

Value                   Count     Frequency (%)
0                       210285    33.1%
0.44                    1         < 0.1%
0.67                    1         < 0.1%
1                       29        < 0.1%
1.37                    1         < 0.1%
2                       41        < 0.1%
2.43                    1         < 0.1%
3                       28        < 0.1%
3.29                    1         < 0.1%
4                       23        < 0.1%

Largest values

Value                   Count     Frequency (%)
59585040.37             1         < 0.1%
57316255.05             1         < 0.1%
37919816.48             1         < 0.1%
37538004.89             1         < 0.1%
35002864.38             1         < 0.1%
34953893.08             1         < 0.1%
34574029.94             1         < 0.1%
34266226.44             1         < 0.1%
33474678.33             1         < 0.1%
33468608.08             1         < 0.1%

new_balance_orig
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 274235
Distinct (%): 43.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 859357.6652
Minimum: 0
Maximum: 49585040.37
Zeros: 361277
Zeros (%): 56.8%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 144824.6175
95-th percentile: 6030913.765
Maximum: 49585040.37
Range: 49585040.37
Interquartile range (IQR): 144824.6175

Descriptive statistics

Standard deviation: 2935834.207
Coefficient of variation (CV): 3.416312353
Kurtosis: 32.03814074
Mean: 859357.6652
Median Absolute Deviation (MAD): 0
Skewness: 5.171901225
Sum: 5.467766268 × 10^11
Variance: 8.619122494 × 10^12
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value                   Count     Frequency (%)
0                       361277    56.8%
72903.91                3         < 0.1%
40563.81                3         < 0.1%
4278.09                 3         < 0.1%
5329.29                 3         < 0.1%
218648.27               2         < 0.1%
1278.35                 2         < 0.1%
131722.05               2         < 0.1%
2674.74                 2         < 0.1%
6566.23                 2         < 0.1%
Other values (274225)   274963    43.2%

Smallest values

Value                   Count     Frequency (%)
0                       361277    56.8%
0.05                    1         < 0.1%
0.5                     1         < 0.1%
1.14                    1         < 0.1%
1.24                    1         < 0.1%
1.45                    1         < 0.1%
1.52                    1         < 0.1%
1.63                    1         < 0.1%
1.87                    1         < 0.1%
2.01                    1         < 0.1%

Largest values

Value                   Count     Frequency (%)
49585040.37             1         < 0.1%
47316255.05             1         < 0.1%
37950093.25             1         < 0.1%
37919816.48             1         < 0.1%
35017380.83             1         < 0.1%
34499396.72             1         < 0.1%
34348398.13             1         < 0.1%
33623300.06             1         < 0.1%
33576123.85             1         < 0.1%
33512987.05             1         < 0.1%

name_dest
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 457297
Distinct (%): 71.9%
Missing: 0
Missing (%): 0.0%
Memory size: 4.9 MiB
C248609774: 16
C1590550415: 15
C2011561834: 15
C2083562754: 14
C1163480574: 13
Other values (457292): 636189

Length

Max length: 11
Median length: 11
Mean length: 10.48172294
Min length: 4

Characters and Unicode

Total characters: 6669122
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 358591
Unique (%): 56.4%

Sample

1st row: C819390946
2nd row: C257205272
3rd row: C134214261
4th row: M635090135
5th row: M332145827

Common Values

Value                   Count     Frequency (%)
C248609774              16        < 0.1%
C1590550415             15        < 0.1%
C2011561834             15        < 0.1%
C2083562754             14        < 0.1%
C1163480574             13        < 0.1%
C1954751748             13        < 0.1%
C381132810              13        < 0.1%
C1722902384             13        < 0.1%
C1787868648             13        < 0.1%
C929266212              13        < 0.1%
Other values (457287)   636124    > 99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
c24860977416
 
< 0.1%
c201156183415
 
< 0.1%
c159055041515
 
< 0.1%
c208356275414
 
< 0.1%
c178786864813
 
< 0.1%
c38113281013
 
< 0.1%
c116348057413
 
< 0.1%
c172290238413
 
< 0.1%
c188033292913
 
< 0.1%
c92926621213
 
< 0.1%
Other values (457287)636124
> 99.9%

Most occurring characters

ValueCountFrequency (%)
1879498
13.2%
2611481
9.2%
3570904
8.6%
4568937
8.5%
0567669
8.5%
6567152
8.5%
8566952
8.5%
9566926
8.5%
5566692
8.5%
7566649
8.5%
Other values (2)636262
9.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number6032860
90.5%
Uppercase Letter636262
 
9.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1879498
14.6%
2611481
10.1%
3570904
9.5%
4568937
9.4%
0567669
9.4%
6567152
9.4%
8566952
9.4%
9566926
9.4%
5566692
9.4%
7566649
9.4%
Uppercase Letter
ValueCountFrequency (%)
C421004
66.2%
M215258
33.8%

Most occurring scripts

ValueCountFrequency (%)
Common6032860
90.5%
Latin636262
 
9.5%

Most frequent character per script

Common
ValueCountFrequency (%)
1879498
14.6%
2611481
10.1%
3570904
9.5%
4568937
9.4%
0567669
9.4%
6567152
9.4%
8566952
9.4%
9566926
9.4%
5566692
9.4%
7566649
9.4%
Latin
ValueCountFrequency (%)
C421004
66.2%
M215258
33.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII6669122
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1879498
13.2%
2611481
9.2%
3570904
8.6%
4568937
8.5%
0567669
8.5%
6567152
8.5%
8566952
8.5%
9566926
8.5%
5566692
8.5%
7566649
8.5%
Other values (2)636262
9.5%

old_balance_dest
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct: 365217
Distinct (%): 57.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1106841.08
Minimum: 0
Maximum: 355381433.6
Zeros: 270408
Zeros (%): 42.5%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 132774.34
Q3: 947431.27
95-th percentile: 5150751.782
Maximum: 355381433.6
Range: 355381433.6
Interquartile range (IQR): 947431.27

Descriptive statistics

Standard deviation: 3486732.494
Coefficient of variation (CV): 3.15016542
Kurtosis: 1164.213696
Mean: 1106841.08
Median Absolute Deviation (MAD): 132774.34
Skewness: 21.97917021
Sum: 7.042409189 × 10^11
Variance: 1.215730349 × 10^13
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value                   Count     Frequency (%)
0                       270408    42.5%
10000000                62        < 0.1%
20000000                24        < 0.1%
40000000                8         < 0.1%
30000000                6         < 0.1%
174                     5         < 0.1%
184                     4         < 0.1%
6001                    4         < 0.1%
347                     4         < 0.1%
130                     4         < 0.1%
Other values (365207)   365733    57.5%

Smallest values

Value                   Count     Frequency (%)
0                       270408    42.5%
0.79                    1         < 0.1%
1.64                    1         < 0.1%
2.5                     1         < 0.1%
4                       2         < 0.1%
4.52                    1         < 0.1%
5                       1         < 0.1%
9                       1         < 0.1%
12.1                    1         < 0.1%
12.95                   1         < 0.1%

Largest values

Value                   Count     Frequency (%)
355381433.6             1         < 0.1%
355185537.1             1         < 0.1%
311404901.4             1         < 0.1%
301140972.5             1         < 0.1%
286606840.4             1         < 0.1%
257660877.1             1         < 0.1%
235879668.4             1         < 0.1%
235855760.3             1         < 0.1%
219050020.8             1         < 0.1%
210599258.7             1         < 0.1%

new_balance_dest
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct: 387378
Distinct (%): 60.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1230661.224
Minimum: 0
Maximum: 355380483.5
Zeros: 243933
Zeros (%): 38.3%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 215337.58
Q3: 1114869.022
95-th percentile: 5535900.888
Maximum: 355380483.5
Range: 355380483.5
Interquartile range (IQR): 1114869.022

Descriptive statistics

Standard deviation: 3754355.69
Coefficient of variation (CV): 3.050681711
Kurtosis: 1048.216962
Mean: 1230661.224
Median Absolute Deviation (MAD): 215337.58
Skewness: 21.22636428
Sum: 7.830229719 × 10^11
Variance: 1.409518665 × 10^13
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value                   Count     Frequency (%)
0                       243933    38.3%
10000000                8         < 0.1%
18123750.85             6         < 0.1%
4310701.91              6         < 0.1%
19169204.93             5         < 0.1%
971418.91               5         < 0.1%
1983599.05              5         < 0.1%
2364068.61              4         < 0.1%
2107778.11              4         < 0.1%
1092234.24              4         < 0.1%
Other values (387368)   392282    61.7%

Smallest values

Value                   Count     Frequency (%)
0                       243933    38.3%
2.76                    1         < 0.1%
3.05                    1         < 0.1%
8.89                    1         < 0.1%
10.58                   1         < 0.1%
12.82                   1         < 0.1%
15.32                   1         < 0.1%
22.47                   1         < 0.1%
23                      1         < 0.1%
23.84                   1         < 0.1%

Largest values

Value                   Count     Frequency (%)
355380483.5             1         < 0.1%
355185537.1             1         < 0.1%
311492902.8             1         < 0.1%
311404901.4             1         < 0.1%
301248443.8             1         < 0.1%
300985927.9             1         < 0.1%
258034674.3             1         < 0.1%
249370946.2             1         < 0.1%
235932693.8             1         < 0.1%
221425442.2             1         < 0.1%

is_flagged_fraud
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.9 MiB
0: 636258
1: 4

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 636262
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value                Count     Frequency (%)
0                    636258    > 99.9%
1                    4         < 0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
0636258
> 99.9%
14
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0636258
> 99.9%
14
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number636262
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0636258
> 99.9%
14
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common636262
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0636258
> 99.9%
14
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII636262
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0636258
> 99.9%
14
 
< 0.1%

is_fraud
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.9 MiB
0: 635441
1: 821

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 636262
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value                Count     Frequency (%)
0                    635441    99.9%
1                    821       0.1%
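The is_fraud counts above describe a heavily imbalanced target. The sketch below turns them into a fraud rate, an imbalance ratio and the accuracy of a trivial "never fraud" baseline. It also gives an upper bound on how much fraud the 4 rows marked in is_flagged_fraud could cover; whether those 4 rows are actually fraudulent is not stated in this report, so the last figure is only a bound.

```python
# Counts copied from the Common Values tables above.
legit, fraud = 635_441, 821      # is_fraud = 0 / 1
flagged = 4                      # is_flagged_fraud = 1

total = legit + fraud
fraud_rate = fraud / total
imbalance_ratio = legit / fraud  # legitimate rows per fraudulent row

# A model that always predicts "not fraud" is right on every legitimate row
# and wrong on every fraudulent one, so its recall on fraud is 0.
baseline_accuracy = legit / total

# Recall of the existing flag is at most flagged / fraud,
# even if every flagged row were truly fraudulent.
max_flag_recall = flagged / fraud

print(f"fraud rate:        {fraud_rate:.3%}")
print(f"imbalance ratio:   {imbalance_ratio:.0f} : 1")
print(f"baseline accuracy: {baseline_accuracy:.3%}")
print(f"flag recall bound: {max_flag_recall:.2%}")
```

With a positive class this rare, accuracy is a poor guide, and metrics such as precision and recall on the fraud class are usually more informative.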

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
0635441
99.9%
1821
 
0.1%

Most occurring characters

ValueCountFrequency (%)
0635441
99.9%
1821
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number636262
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0635441
99.9%
1821
 
0.1%

Most occurring scripts

ValueCountFrequency (%)
Common636262
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0635441
99.9%
1821
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII636262
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0635441
99.9%
1821
 
0.1%

days
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 606
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.18464491
Minimum: 1
Maximum: 30.95833333
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 6.5
median: 10.04166667
Q3: 13.95833333
95-th percentile: 20.5
Maximum: 30.95833333
Range: 29.95833333
Interquartile range (IQR): 7.458333333

Descriptive statistics

Standard deviation: 5.878241745
Coefficient of variation (CV): 0.577167078
Kurtosis: 0.3364776649
Mean: 10.18464491
Median Absolute Deviation (MAD): 3.75
Skewness: 0.4069573473
Sum: 6480102.542
Variance: 34.55372601
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value                Count     Frequency (%)
1                    57562     9.0%
7.791666667          4976      0.8%
12.79166667          4701      0.7%
9.791666667          4674      0.7%
6.791666667          4627      0.7%
5.791666667          4560      0.7%
14.79166667          4531      0.7%
16.79166667          4482      0.7%
10.79166667          4480      0.7%
1.791666667          4466      0.7%
Other values (596)   537203    84.4%

Smallest values

Value                Count     Frequency (%)
1                    57562     9.0%
1.041666667          154       < 0.1%
1.083333333          43        < 0.1%
1.125                4         < 0.1%
1.291666667          1         < 0.1%
1.333333333          3         < 0.1%
1.375                2330      0.4%
1.416666667          3149      0.5%
1.458333333          2902      0.5%
1.5                  3956      0.6%

Largest values

Value                Count     Frequency (%)
30.95833333          2         < 0.1%
30.875               2         < 0.1%
30.83333333          1         < 0.1%
30.79166667          2         < 0.1%
30.70833333          2         < 0.1%
30.66666667          2         < 0.1%
30.625               1         < 0.1%
30.58333333          2         < 0.1%
30.54166667          1         < 0.1%
30.45833333          2         < 0.1%
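The report flags step and days as highly correlated, and the figures suggest a direct derivation. They are consistent with days = max(step / 24, 1), that is, a step measured in hours converted to days and floored at 1. This mapping is an inference from the reported statistics, not something the report states, and `step_to_days` is a hypothetical helper. The sketch checks it against the quantiles of both variables:

```python
import math

HOURS_PER_DAY = 24  # assumption: one step corresponds to one hour


def step_to_days(step: int) -> float:
    """Hypothetical mapping from step to days, floored at one day."""
    return max(step / HOURS_PER_DAY, 1.0)


# Quantiles reported for `step` and for `days` in this report.
step_quantiles = {"min": 1, "p5": 16, "q1": 156, "median": 241,
                  "q3": 335, "p95": 492, "max": 743}
days_quantiles = {"min": 1, "p5": 1, "q1": 6.5, "median": 10.04166667,
                  "q3": 13.95833333, "p95": 20.5, "max": 30.95833333}

for name, step in step_quantiles.items():
    derived = step_to_days(step)
    ok = math.isclose(derived, days_quantiles[name], rel_tol=1e-8)
    print(f"{name:>6}: step={step:>3} -> days={derived:.8f} match={ok}")

all_match = all(
    math.isclose(step_to_days(s), days_quantiles[k], rel_tol=1e-8)
    for k, s in step_quantiles.items()
)
```

The same mapping would also explain the distinct counts: if all of steps 1 to 24 occur, they collapse into the single value 1, reducing 629 distinct steps to 606 distinct days. If the relationship holds, one of the two columns is redundant for modelling.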

dif_balance_orig
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct: 342931
Distinct (%): 53.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 21161.98328
Minimum: -10000000
Maximum: 1915267.9
Zeros: 208941
Zeros (%): 32.8%
Negative: 287230
Negative (%): 45.1%
Memory size: 4.9 MiB

Quantile statistics

Minimum: -10000000
5-th percentile: -74279.9
Q1: -10134
median: 0
Q3: 0
95-th percentile: 252672.475
Maximum: 1915267.9
Range: 11915267.9
Interquartile range (IQR): 10134

Descriptive statistics

Standard deviation: 154473.8823
Coefficient of variation (CV): 7.299593815
Kurtosis: 1586.505489
Mean: 21161.98328
Median Absolute Deviation (MAD): 7270.275
Skewness: -26.86841927
Sum: 1.34645658 × 10^10
Variance: 2.38621803 × 10^10
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value                   Count     Frequency (%)
0                       208941    32.8%
-162                    91        < 0.1%
-109                    91        < 0.1%
-111                    86        < 0.1%
-181                    85        < 0.1%
-125                    85        < 0.1%
-127                    84        < 0.1%
-115                    84        < 0.1%
-120                    82        < 0.1%
-170                    82        < 0.1%
Other values (342921)   426551    67.0%

Smallest values

Value                   Count     Frequency (%)
-10000000               40        < 0.1%
-10000000               2         < 0.1%
-9772559.35             1         < 0.1%
-9468064.05             1         < 0.1%
-9097679.75             1         < 0.1%
-8994286.69             1         < 0.1%
-8991625.83             1         < 0.1%
-8980853.88             1         < 0.1%
-8950011.46             1         < 0.1%
-8924971.59             1         < 0.1%

Largest values

Value                   Count     Frequency (%)
1915267.9               1         < 0.1%
1782621.49              1         < 0.1%
1394689.81              1         < 0.1%
1290876.4               1         < 0.1%
1240885.3               1         < 0.1%
1225177.33              1         < 0.1%
1208150.77              1         < 0.1%
1172012.84              1         < 0.1%
1162958.23              1         < 0.1%
1129849.78              1         < 0.1%

dif_balance_dest
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct: 404171
Distinct (%): 63.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 123820.1448
Minimum: -5353303.67
Maximum: 82704592.26
Zeros: 231697
Zeros (%): 36.4%
Negative: 124229
Negative (%): 19.5%
Memory size: 4.9 MiB

Quantile statistics

Minimum: -5353303.67
5-th percentile: -222073.1895
Q1: 0
median: 0
Q3: 148805.4575
95-th percentile: 559753.337
Maximum: 82704592.26
Range: 88057895.93
Interquartile range (IQR): 148805.4575

Descriptive statistics

Standard deviation: 796750.2258
Coefficient of variation (CV): 6.434738282
Kurtosis: 1749.045948
Mean: 123820.1448
Median Absolute Deviation (MAD): 62685.34
Skewness: 32.5985827
Sum: 7.878205297 × 10^10
Variance: 6.348109223 × 10^11
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value                   Count     Frequency (%)
0                       231697    36.4%
10000000                19        < 0.1%
-15000                  9         < 0.1%
-5000                   6         < 0.1%
-182                    4         < 0.1%
-100000                 4         < 0.1%
-347                    3         < 0.1%
-10000                  3         < 0.1%
-500                    3         < 0.1%
-314                    3         < 0.1%
Other values (404161)   404511    63.6%

Smallest values

Value                   Count     Frequency (%)
-5353303.67             1         < 0.1%
-4525018.81             1         < 0.1%
-4381869.85             1         < 0.1%
-4304267                1         < 0.1%
-2199286.39             1         < 0.1%
-2116822.38             1         < 0.1%
-2014707.88             1         < 0.1%
-1984174                1         < 0.1%
-1952454.02             1         < 0.1%
-1915267.9              1         < 0.1%

Largest values

Value                   Count     Frequency (%)
82704592.26             1         < 0.1%
75885725.63             1         < 0.1%
75549141.11             1         < 0.1%
69886731.3              1         < 0.1%
66308418.91             1         < 0.1%
64950673.26             1         < 0.1%
63294839.63             1         < 0.1%
56829381.3              1         < 0.1%
52403631.67             1         < 0.1%
51343097.27             1         < 0.1%
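The two dif_balance columns look like engineered features. Their reported means are consistent with "new balance minus old balance" for the origin and destination accounts respectively, because the mean of a difference equals the difference of the means. The report does not state how the columns were built, so the sketch below only checks that this reading agrees with the published means:

```python
import math

# Means copied from the variable summaries in this report.
means = {
    "old_balance_org": 838195.6819,
    "new_balance_orig": 859357.6652,
    "old_balance_dest": 1106841.08,
    "new_balance_dest": 1230661.224,
    "dif_balance_orig": 21161.98328,
    "dif_balance_dest": 123820.1448,
}

# Hypothesis: dif = new - old, checked on the means only.
expected_orig = means["new_balance_orig"] - means["old_balance_org"]
expected_dest = means["new_balance_dest"] - means["old_balance_dest"]

# The absolute tolerance allows for rounding of the printed means.
orig_ok = math.isclose(expected_orig, means["dif_balance_orig"], abs_tol=0.01)
dest_ok = math.isclose(expected_dest, means["dif_balance_dest"], abs_tol=0.01)

print(f"origin:      {expected_orig:.4f} vs {means['dif_balance_orig']} -> {orig_ok}")
print(f"destination: {expected_dest:.4f} vs {means['dif_balance_dest']} -> {dest_ok}")
```

Agreement of the means does not prove the row-level definition, but it does account for the high-correlation warnings between these columns and the balance and amount columns.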

Interactions

[Interaction plots between pairs of numeric variables; figures not available in text form]
2021-08-18T14:04:52.365701image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-08-18T14:04:52.674188image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-08-18T14:04:52.949592image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-08-18T14:04:53.208133image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-08-18T14:04:53.474109image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-08-18T14:04:53.744108image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-08-18T14:04:53.994525image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-08-18T14:04:54.241817image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-08-18T14:04:54.526869image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-08-18T14:04:54.807228image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-08-18T14:04:55.098710image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-08-18T14:04:55.373142image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/
2021-08-18T14:04:55.638745image/svg+xmlMatplotlib v3.4.3, https://matplotlib.org/

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
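
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report itself runs; the function name `pearson_r` is an assumption made for this example.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of
    their standard deviations (the 1/n factors cancel out)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Unnormalised covariance and standard deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (sd_x * sd_y)


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
r_plain = pearson_r(x, y)
# Shifting and (positively) rescaling x leaves r unchanged.
r_shifted = pearson_r([3.0 * a + 10.0 for a in x], y)
```

Both `r_plain` and `r_shifted` come out at 1 (up to floating-point rounding), which illustrates the invariance under changes in location and scale.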

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
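
As a rough sketch of that definition, the snippet below ranks both variables (ties receive the average of their positions, which is a common convention and an assumption here) and then applies the Pearson formula to the ranks. The helper names `average_ranks` and `spearman_rho` are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of
    their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rho is undefined when a variable is constant")
    return cov / (sd_x * sd_y)


# A cubic relationship is monotonic but not linear, so rho is 1
# (up to rounding) even though Pearson's r would be below 1.
rho = spearman_rho([1, 2, 3, 4], [1, 8, 27, 64])
```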

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
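
The pair-counting rule above corresponds to the variant often called tau-a. The sketch below implements exactly that rule; it is an illustration only, and statistical libraries frequently use a tie-adjusted variant (tau-b) instead, so values can differ when the data contain ties. The name `kendall_tau_a` is an assumption for this example.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs.

    Pairs tied in x or in y count as neither concordant nor discordant.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Three observations give three pairs: two concordant, one discordant.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
```

The double loop is quadratic in the number of observations, so this form is only practical for small samples.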

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
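
A minimal sketch of a bias-corrected Cramér's V follows, assuming the commonly cited form of Bergsma's correction (the chi-squared statistic divided by n is reduced by (r-1)(k-1)/(n-1), and the row and column counts are shrunk accordingly). The function name and the treatment of degenerate inputs are assumptions for this example, not the report's own implementation.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long lists of categories."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    row_totals = Counter(x)
    col_totals = Counter(y)
    observed = Counter(zip(x, y))
    # Pearson chi-squared statistic over every cell, including empty ones.
    chi2 = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(row, col)] - expected) ** 2 / expected
    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (r - 1) * (k - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(r_corr - 1, k_corr - 1)
    if denominator <= 0:
        # A variable with a single category carries no association.
        return 0.0
    return math.sqrt(phi2_corr / denominator)


# Perfect association: each category of x maps to one category of y.
v_perfect = cramers_v_corrected(["a", "b"] * 50, ["u", "v"] * 50)
# Independence: every combination occurs equally often.
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 25,
                                    ["u", "v", "u", "v"] * 25)
```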

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

First rows

   step  type      amount       name_orig    old_balance_org  new_balance_orig  name_dest    old_balance_dest  new_balance_dest  is_flagged_fraud  is_fraud  days    dif_balance_orig  dif_balance_dest
0  353   CASH_OUT  150540.160   C1389413404  9912.000         0.000             C819390946   29817.590         180357.750        0                 0         14.708  -9912.000         150540.160
1  282   CASH_OUT  66723.640    C958468196   0.000            0.000             C257205272   1136277.810       1203001.450       0                 0         11.750  0.000             66723.640
2  228   TRANSFER  1039375.010  C857481806   2328.000         0.000             C134214261   437583.330        1476958.340       0                 0         9.500   -2328.000         1039375.010
3  36    PAYMENT   9178.610     C558963849   96237.620        87059.010         M635090135   0.000             0.000             0                 0         1.500   -9178.610         0.000
4  48    PAYMENT   4527.240     C1644082954  51925.000        47397.760         M332145827   0.000             0.000             0                 0         2.000   -4527.240         0.000
5  22    CASH_OUT  35951.010    C1596965065  29533.000        0.000             C1727565779  12239732.300      12275683.310      0                 0         1.000   -29533.000        35951.010
6  330   PAYMENT   6738.520     C859233135   0.000            0.000             M1068538554  0.000             0.000             0                 0         13.750  0.000             0.000
7  228   PAYMENT   21206.810    C1531572949  51453.000        30246.190         M1142464058  0.000             0.000             0                 0         9.500   -21206.810        0.000
8  207   CASH_OUT  296500.400   C2107606313  0.000            0.000             C32332084    4622375.480       4918875.880       0                 0         8.625   0.000             296500.400
9  135   CASH_OUT  218014.960   C2070855983  10510.000        0.000             C751256487   657169.850        875184.810        0                 0         5.625   -10510.000        218014.960

Last rows

        step  type      amount       name_orig    old_balance_org  new_balance_orig  name_dest    old_balance_dest  new_balance_dest  is_flagged_fraud  is_fraud  days    dif_balance_orig  dif_balance_dest
636252  321   PAYMENT   871.120      C155502700   29290.000        28418.880         M693853065   0.000             0.000             0                 0         13.375  -871.120          0.000
636253  394   CASH_IN   220041.020   C131070433   761793.000       981834.020        C466352325   433656.620        213615.610        0                 0         16.417  220041.020        -220041.010
636254  353   CASH_IN   6286.100     C280350536   7226.000         13512.100         C81030497    0.000             0.000             0                 0         14.708  6286.100          0.000
636255  372   PAYMENT   707.460      C82052290    150882.410       150174.950        M1038425751  0.000             0.000             0                 0         15.500  -707.460          0.000
636256  189   CASH_IN   374394.760   C313176195   2975644.790      3350039.550       C1821502927  6544869.360       6170474.600       0                 0         7.875   374394.760        -374394.760
636257  691   CASH_OUT  184745.770   C1193750587  1155.000         0.000             C1165106583  186549.970        371295.740        0                 0         28.792  -1155.000         184745.770
636258  21    CASH_OUT  78836.940    C129383044   30658.000        0.000             C1638313098  4642989.020       4721825.960       0                 0         1.000   -30658.000        78836.940
636259  16    CASH_IN   265300.000   C1294861820  11400.000        276700.000        C2004070631  0.000             488175.010        0                 0         1.000   265300.000        488175.010
636260  204   PAYMENT   6747.460     C1103134800  6467.720         0.000             M1070124017  0.000             0.000             0                 0         8.500   -6467.720         0.000
636261  154   PAYMENT   6865.630     C1013856770  0.000            0.000             M1721419416  0.000             0.000             0                 0         6.417   0.000             0.000
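
In the sample rows shown, the two derived columns appear to be the new balance minus the old balance on each side of the transaction. The report does not state how they were computed, so the sketch below is an assumption checked only against three of the sample rows; the helper name `balance_diffs` is hypothetical.

```python
import math

# Three rows copied from the sample above (indices 0, 636253 and 636260).
rows = [
    {"old_balance_org": 9912.000, "new_balance_orig": 0.000,
     "old_balance_dest": 29817.590, "new_balance_dest": 180357.750,
     "dif_balance_orig": -9912.000, "dif_balance_dest": 150540.160},
    {"old_balance_org": 761793.000, "new_balance_orig": 981834.020,
     "old_balance_dest": 433656.620, "new_balance_dest": 213615.610,
     "dif_balance_orig": 220041.020, "dif_balance_dest": -220041.010},
    {"old_balance_org": 6467.720, "new_balance_orig": 0.000,
     "old_balance_dest": 0.000, "new_balance_dest": 0.000,
     "dif_balance_orig": -6467.720, "dif_balance_dest": 0.000},
]


def balance_diffs(row):
    """Assumed derivation: new balance minus old balance, per account."""
    dif_orig = row["new_balance_orig"] - row["old_balance_org"]
    dif_dest = row["new_balance_dest"] - row["old_balance_dest"]
    return dif_orig, dif_dest


# Compare the recomputed values with the reported ones, allowing for
# the three-decimal rounding used in the table.
matches = [
    math.isclose(balance_diffs(r)[0], r["dif_balance_orig"], abs_tol=0.005)
    and math.isclose(balance_diffs(r)[1], r["dif_balance_dest"], abs_tol=0.005)
    for r in rows
]
```

If the assumption holds, this would also be consistent with the high correlation the report flags between amount and dif_balance_dest.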